Assessing the Use of Terminology in Phrase-Based Statistical Machine Translation for Academic Course Catalogues Translation
Abstract
English. In this contribution we describe an approach to evaluating the use of terminology in a phrase-based statistical machine translation (PBSMT) system that translates course unit descriptions from Italian into English. This genre is very prominent among the texts that universities need to translate in European countries where English is not a native language. Two MT engines are trained on an in-domain bilingual corpus and a subset of the Europarl corpus, and one of them is enhanced by adding a bilingual termbase to its training data. The overall performance of the systems is assessed with the BLEU score, whereas the f-score is used to focus the evaluation on term translation. Furthermore, a manual analysis of the terms is carried out. Results suggest that, in some cases, despite the simplistic approach implemented to inject terms into the MT system, the termbase was able to bias the word choice of the engine.

Italiano. Nel presente lavoro viene descritto un metodo per valutare l’uso di terminologia in un sistema PBSMT per tradurre descrizioni di unità formative dall’italiano in inglese. La traduzione di questo genere di testi è fondamentale per le università di Paesi europei dove l’inglese non è una lingua ufficiale. Due sistemi di MT vengono addestrati su un corpus in-domain e un sottoinsieme del corpus Europarl. Ad uno dei due sistemi viene aggiunto un glossario bilingue. La valutazione delle prestazioni globali dei sistemi avviene tramite BLEU score, mentre f-score è usato per la valutazione specifica della traduzione dei termini. È stata inoltre condotta un’analisi manuale dei termini. I risultati evidenziano che, nonostante il metodo elementare utilizzato per inserire i termini nel sistema di MT, il termbase è in alcuni casi in grado di influenzare la scelta dei termini nell’output.
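As an illustration of the term-focused evaluation mentioned in the abstract, the following is a minimal sketch of a term-level f-score. The abstract does not say how the score was computed, so everything here is an assumption: the function names `find_terms` and `term_f_score`, the matching rule (case-insensitive, whitespace-tokenised, exact match of the English side of the termbase), and the toy sentences are all hypothetical. A termbase term counts as correct when it occurs in both the MT output and the reference translation of the same segment.

```python
def find_terms(sentence, target_terms):
    """Return the target-language terms that occur in the sentence.

    Assumption: matching is case-insensitive and works on whitespace
    tokens, so punctuation attached to a word would prevent a match.
    """
    padded = " " + " ".join(sentence.lower().split()) + " "
    return {t for t in target_terms if " " + t.lower() + " " in padded}


def term_f_score(hypotheses, references, target_terms):
    """Compute corpus-level precision, recall and F1 over termbase terms."""
    true_pos = hyp_total = ref_total = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_terms = find_terms(hyp, target_terms)
        ref_terms = find_terms(ref, target_terms)
        true_pos += len(hyp_terms & ref_terms)  # terms found in both
        hyp_total += len(hyp_terms)             # terms produced by the engine
        ref_total += len(ref_terms)             # terms expected by the reference
    precision = true_pos / hyp_total if hyp_total else 0.0
    recall = true_pos / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data, not taken from the paper.
target_terms = {"course unit", "learning outcomes", "oral exam"}
references = [
    "the course unit ends with an oral exam",
    "learning outcomes are listed below",
]
hypotheses = [
    "the course unit ends with an oral test",
    "learning outcomes are listed below",
]
precision, recall, f1 = term_f_score(hypotheses, references, target_terms)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

In this toy case the engine misses "oral exam", so recall drops while precision stays at 1. A score of this kind isolates term translation from general output quality, which a corpus-level metric such as BLEU does not do.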
Similar resources
Enhancing Machine Translation of Academic Course Catalogues with Terminological Resources
This paper describes an approach to translating course unit descriptions from Italian and German into English, using a phrase-based machine translation (MT) system. The genre is very prominent among those requiring translation by universities in European countries in which English is a non-native language. For each language combination, an in-domain bilingual corpus including course unit and de...
A new model for Persian multi-part words edition based on statistical machine translation
Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...
A Comparative Study of English-Persian Translation of Neural Google Translation
Many studies abroad have focused on neural machine translation and almost all concluded that this method was much closer to humanistic translation than machine translation. Therefore, this paper aimed at investigating whether neural machine translation was more acceptable in English-Persian translation in comparison with machine translation. Hence, two types of text were chosen to be translated...
Bilingual Termbank Creation via Log-Likelihood Comparison and Phrase-Based Statistical Machine Translation
Bilingual termbanks are important for many natural language processing (NLP) applications, especially in translation workflows in industrial settings. In this paper, we apply a log-likelihood comparison method to extract monolingual terminology from the source and target sides of a parallel corpus. Then, using a Phrase-Based Statistical Machine Translation model, we create a bilingual terminolo...
Accuracy-Based Scoring for Phrase-Based Statistical Machine Translation
Although the scoring features of state-of-the-art Phrase-Based Statistical Machine Translation (PB-SMT) models are weighted so as to optimise an objective function measuring translation quality, the estimation of the features themselves does not have any relation to such quality metrics. In this paper, we introduce a translation quality-based feature to PBSMT in a bid to improve the translation ...